An Example of Hierarchical Ordination

This document describes fitting a hierarchical ordination to data, including code and the nasty details that are needed.

The data are included in the ‘mvabund’ R package, but were originally collected by Gibb et al. They are the observed abundances of 41 ant species at 30 sites in Australia, along with 7 environmental variables and 5 traits. The ecological question is how the environmental variables and traits affect the community composition, including whether they interact. The methodological question is how hierarchical ordination helps.

The abundances, traits and environment are each stored in a different matrix. First we load the data and set some constants:

library(nimble)
library(nimbleHMC)
library(mvabund)
library(coda)

data("antTraits")
Y <- antTraits$abund # abundance data of sites by species
X <- scale(antTraits$env) # environmental variables of sites by predictors
TR <- scale(model.matrix(~0+.,antTraits$traits)) # species traits

NSites <- nrow(Y) # number of sites
NSpecies <- ncol(Y) # number of species
NTraits <- ncol(TR) # number of traits
NEnv <- ncol(X) # number of environmental predictors

# Create data lists for Nimble
dat <- list(Y = Y, X = X, TR = TR)
consts <- list(NSites = NSites, NEnv = NEnv, NTraits = NTraits, NSpecies = NSpecies)

The Model

The data are counts of each species, so we assume they follow a Poisson distribution with a log link function, as we would do in a standard generalised linear model. We assume each species has its own mean abundance (i.e. a different \(\beta_{0j}\) for each species \(j\)), and model the rest of the variation with the hierarchical ordination. This gives the following model for the log of the mean abundance (\(\eta_{ij}\)):

\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]

As with a normal ordination, \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) are the site scores and species loadings. But here we assume that they each have a variance of 1, so that \(\boldsymbol{\Sigma}\) is the variance of the latent variables: it will typically be a diagonal matrix, and if any of the terms on the diagonal are close to 0, this suggests that the latent variable has (almost) no effect. It thus provides a straightforward summary of the importance of each latent variable.

If we stopped here, this would be a standard GLLVM. But we want to model both \(\boldsymbol{z}\) and \(\boldsymbol{\gamma}_j\), i.e. go from modelling groups of species responding in similar ways to sites, to modelling how the traits of the species affect their responses to the environment.

As a simplification, we can think about this as simply a regression of \(\boldsymbol{z}\) (the site effects) against the environmental variables and \(\boldsymbol{\gamma}_j\) against the traits. In reality, it is more complicated, because \(\boldsymbol{z}\) and \(\boldsymbol{\gamma}_j\) are estimated in the model, so their uncertainty needs to be included.
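To make this concrete, the data-generating process for a single latent variable can be sketched in a few lines of base R (all names and values here are illustrative, not taken from the fitted model):

```r
set.seed(101)
NSites <- 30; NSpecies <- 41; NEnv <- 5; NTraits <- 4

X  <- matrix(rnorm(NSites * NEnv), NSites, NEnv)           # environment
TR <- matrix(rnorm(NSpecies * NTraits), NSpecies, NTraits) # traits

B     <- rnorm(NEnv)    # environment -> site score coefficients
omega <- rnorm(NTraits) # trait -> species loading coefficients

z     <- drop(X %*% B) + rnorm(NSites, sd = 0.5)        # site scores
gamma <- drop(TR %*% omega) + rnorm(NSpecies, sd = 0.5) # species loadings
z     <- z / sd(z); gamma <- gamma / sd(gamma)          # standardise to variance 1

sigma <- 0.7            # sd of the latent variable (the diagonal of Sigma)
beta0 <- rnorm(NSpecies)

eta <- outer(rep(1, NSites), beta0) + sigma * (z %o% gamma) # linear predictor
Y   <- matrix(rpois(length(eta), exp(eta)), NSites, NSpecies)
```

Simulating from the model like this is also a handy way to check that the fitting code below can recover known parameters.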

The mathematical details are hidden away here, for those who want them.

We denote the abundance at site \(i = 1 \dots n\) for species \(j = 1 \dots p\) as \(Y_{ij}\), or as a matrix as \(\boldsymbol{Y}\). The environmental predictors are \(X_{ik}\) for the \(k = 1 \ldots K\) predictors (or \(\boldsymbol{X}\) as a matrix), and the traits are \(R_{jt}\) for the \(t = 1\ldots T\) traits (or \(\boldsymbol{R}\) as a matrix). We then assume

\[Y_{ij} \sim \text{Pois}(\lambda_{ij}),\]

with

\[\log(\lambda_{ij}) = \eta_{ij}.\]

Here \(\eta_{ij}\) is the linear predictor, which we will model with the hierarchical ordination:

\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j \]

We then model \(\boldsymbol{z}_i\) (as in van der Veen et al (2023)) and \(\boldsymbol{\gamma}_j\) hierarchically:

\[ \boldsymbol{z}_i = \boldsymbol{B}^\top\boldsymbol{x}_i + \boldsymbol{\epsilon}_i \] and

\[ \boldsymbol{\gamma}_j = \boldsymbol{\omega}^\top\boldsymbol{R}_{j} + \boldsymbol{\varepsilon}_j \] where

  • \(x_{ik}\) is the \(k^{th}\) predictor (i.e. environmental effect) at site \(i\)
  • \(\boldsymbol{B}\) with entry \(b_{kq} \sim \mathcal{N}(0,1)\) is the effect of the \(k^{th}\) predictor on the site score for the \(q^{th}\) latent variable (\(q = 1\ldots d\))
  • \(\boldsymbol{\epsilon}_i\) with entry \(\epsilon_{iq} \sim \mathcal{N}(0, \sigma^2_q)\) is a vector of residuals for the unexplained part of the site score
  • \(R_{jt}\) is the \(t^{th}\) predictor (i.e. trait) for species \(j\)
  • \(\boldsymbol{\omega}_t\) with entry \(\omega_{tq}\) is the effect of the \(t^{th}\) trait on the species loading for the \(q^{th}\) latent variable
  • \(\boldsymbol{\varepsilon}_j\) with entry \(\varepsilon_{jq} \sim \mathcal{N}(0, \delta^2_q)\) is a vector of residuals for the unexplained part of the species loading
Note that the predictors are all standardized to zero mean, variance one. We additionally place exponential priors on the scale parameters \(\boldsymbol{\sigma}\) and \(\boldsymbol{\delta}\) with rate parameter one.

Implementation

We fit the model with the \(\texttt{Nimble}\) package. We start with a single dimension for simplicity, so that we can show the steps needed.

One dimensional ordination

The fitting process needs the model, and the code to run it.

The Nimble code that describes the model is here
OneLV.nim <- nimbleCode({
  for (i in 1:NSites) {
    for (j in 1:NSpecies) {
# These three lines specify the model:   
      Y[i, j] ~ dpois(lambda[i, j]) # Likelihood: Y follows a Poisson distribution
      log(lambda[i, j]) <- eta[i, j] # Specify the log link function
      eta[i,j] <- beta0[j] + gamma[j]*sd.LV*z[i] # linear predictor: species + LV
    }
# Site scores: regression against environmental effects
#    the Bs are the coefficients of the environmental effects
      epsilonSTAR[i] ~ dnorm(0, sd = sd.Site) # Residual site effect
      XB[i] <- inprod(X[i, 1:NEnv],BSTAR[1:NEnv])
      zSTAR[i] <- XB[i] + epsilonSTAR[i]
  }
  
  for(j in 1:NSpecies) {
# Species effects: regression against trait effects
#    The Os are the trait effects.
      varepsilonSTAR[j] ~ dnorm(0, sd = sd.Species) # Residual
      omegaTR[j] <- inprod(TR[j, 1:NTraits],OSTAR[1:NTraits])
      gammaSTAR[j] <- omegaTR[j] + varepsilonSTAR[j]
      beta0[j] ~ dnorm(0, sd = 1) # Species means
  }
  
# Here we standardise z and gamma, so they both have a variance of 1.
#   The standard deviation of the latent variable is sd.LV.
    zmu <- mean(zSTAR[1:NSites])
    gammamu<- mean(gammaSTAR[1:NSpecies])
    StdDev.z <- sqrt(mean((zSTAR[1:NSites] - zmu)^2))
    StdDev.gamma <- sqrt(mean((gammaSTAR[1:NSpecies] - gammamu)^2)) 
    z[1:NSites] <- (zSTAR[1:NSites]-zmu)/StdDev.z
    gamma[1:NSpecies] <- (gammaSTAR[1:NSpecies]-gammamu)/StdDev.gamma
    
# Scale other parameters
    epsmu <- mean(epsilonSTAR[1:NSites])
    varepsmu <- mean(varepsilonSTAR[1:NSpecies])
    
    epsilon[1:NSites] <- (epsilonSTAR[1:NSites]-epsmu)/StdDev.z
    B[1:NEnv] <-  BSTAR[1:NEnv]/StdDev.z
    varepsilon[1:NSpecies] <- (varepsilonSTAR[1:NSpecies]-varepsmu)/StdDev.gamma
    O[1:NTraits] <-  OSTAR[1:NTraits]/StdDev.gamma

    # priors for scales
    sd.Site ~ dexp(1)
    sd.Species ~ dexp(1)
    sd.LV ~ dexp(1)

    for(k in 1:NEnv){
      BSTAR[k] ~ dnorm(0, sd = 1)  
    }
    for(tr in 1:NTraits){
      OSTAR[tr] ~ dnorm(0, sd = 1)
    }
})
The MCMC needs initial values to start from: the function to generate these is here
inits<-function(consts){
  BSTAR = rnorm(consts$NEnv)
  OSTAR = rnorm(consts$NTraits)
  varepsilonSTAR = rnorm(consts$NSpecies)
  epsilonSTAR = rnorm(consts$NSites)
  list(
    BSTAR = BSTAR,
    OSTAR = OSTAR,
    epsilonSTAR = epsilonSTAR,
    varepsilonSTAR = varepsilonSTAR,
    sd.Site = rexp(1),
    sd.Species = rexp(1),
    beta0 = rnorm(consts$NSpecies),
    sd.LV = rexp(1)
  )
}

We parallelise running the MCMC so that each chain is run on a different processor. This means we need a function to run one chain, and then another to run all of the chains together.

Functions to run the code are here.
# Function to run one chain: it can be done with HMC or other MCMC algorithms.
RunOneChain <- function(seed, dat, code, inits, consts, ToMonitor=NULL, 
                        Nburn=5e3, NIter=5.5e4, Nthin=10, HMC = FALSE, ...) {
  require(nimble)
  require(nimbleHMC)
  AllSamplers <-c('epsilonSTAR', 'varepsilonSTAR', 'beta0', 'OSTAR', 'BSTAR',
                  'sd.Site', 'sd.Species', 'sd.LV')
  HMCsamplers <-c('epsilonSTAR', 'varepsilonSTAR','BSTAR', 'OSTAR', 'beta0', 
                  'sd.Site', 'sd.Species', 'LV', 'sd.LV')
  
  if(is.null(ToMonitor)) {
    ToMonitor <- c("beta0", "sd.Species", "sd.Site", "sd.LV", "B", "O", 
                   "epsilon", "varepsilon", "gamma", "z")
  }
#  Mons <- c("beta0","sd.Species","sd.Site","sd.LV","B","OSTAR", "epsilonSTAR","varepsilonSTAR")
  mod <- nimbleModel(code = code, name = "HO", constants = consts, 
                     inits = inits(consts), data = dat, buildDerivs = TRUE)
  model <- compileNimble(mod)
  if(HMC) { # Do HMC on everything
    conf <- configureHMC(model, monitors = ToMonitor, print = FALSE, 
                         control=list(nwarmup=Nburn, kappa=0.95))
    # Use a slice sampler for everything that is not being HMCed
    Slicesamplers <- AllSamplers[!AllSamplers %in% HMCsamplers] 
    if(length(Slicesamplers) > 0) {
      conf$removeSamplers(Slicesamplers)
      sapply(Slicesamplers, conf$addSampler, type = "AF_slice")
    }
  } else {
    conf <- configureMCMC(model, monitors = ToMonitor, print = FALSE)
    # Change the samplers to block update, using a nifty extension of the slice sampler
    Slicesamplers <-c('epsilon', 'varepsilon')
    conf$removeSamplers(Slicesamplers)
    conf$addSampler(target = Slicesamplers, type = "AF_slice")
  }
#  nimbleOptions(MCMCusePredictiveDependenciesInCalculations = FALSE)
  mcmc <- buildMCMC(conf)
  cmcmc <- compileNimble(mcmc, project = model)
  res <- runMCMC(cmcmc,  niter=NIter, nburnin = Nburn, thin=Nthin, 
                 nchains = 1, samplesAsCodaMCMC = TRUE, ...)
  return(res)
}

# Function to run parallel chains
ParaNimble <- function(NChains, ...) {
  require(parallel)
  nimble_cluster <- makeCluster(NChains)
  
  samples <- parLapply(cl = nimble_cluster, X = 1:NChains, ...)
  stopCluster(nimble_cluster)

  # Name the chains in the list
  chains <- setNames(samples,paste0("chain", 1:length(samples)))
  chains
}

These models are notorious for being unidentifiable, i.e. you can get the same mean abundances from different combinations of the parameters. We therefore have to make some adjustments to the model: some of this is done in the model fitting, but for others it is easier to do it afterwards.

The details of what we do to make this identifiable are here
  • First, we standardise \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) to unit variance per latent variable to prevent scale invariance.
  • At this point the model is still invariant to sign switching: because \(z_{i}\gamma_{j} = (-z_{i})(-\gamma_{j})\), the MCMC algorithm can (and does) switch signs during the run. We could solve this by placing a truncated normal prior on the main diagonal entries of \(\boldsymbol{B}\) or \(\boldsymbol{\omega}\), but that still results in a bimodal posterior distribution for other coefficients. Instead we leave this as it is during the MCMC and post-process the chains. We choose one variable, i.e. one \(z_{i}\) or \(\gamma_{j}\), to fix to be positive, and switch all of the other variables accordingly. If we choose a variable that should only be of one sign, this will fix the problem. See below for the details of how we do this.
  • Note, however, that for \(d \gt K\) the model is still invariant to sign switches, so if the number of latent variables is greater than the number of predictors, additional constraints need to be imposed on the diagonal of \(\boldsymbol{\omega}\), or on the \(\boldsymbol{\epsilon}_i\)’s or \(\boldsymbol{\varepsilon}_j\)’s instead. Obviously we do not need to do this in this model, but it is included below for the more general case.

The functions to swap signs are below. We want to make one parameter positive, so ideally we should do this with a parameter that doesn’t want to swap signs. We identify this in an ad hoc way: for every parameter we calculate, in each chain, the proportion of iterations where the sign is positive, and then the variance of that proportion across chains. If a parameter is centred around 0, the proportion will be about 0.5 in every chain and will not vary much, whereas if it is some way from 0 each chain’s proportion will be close to 0 or 1, and the variance across chains will be high.

An alternative approach would be to look at the largest \(|p - 0.5|\). There may be better alternatives too.
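A toy illustration of this heuristic, on two synthetic “chains” (the parameter names `a` and `b` are made up): `a` sits away from 0, so each chain settles on one sign and the proportion of positive draws differs a lot between chains; `b` straddles 0, so its proportion stays near 0.5 everywhere.

```r
set.seed(42)
NIter <- 1000
chain1 <- cbind(a = rnorm(NIter, mean =  2), b = rnorm(NIter))
chain2 <- cbind(a = rnorm(NIter, mean = -2), b = rnorm(NIter)) # sign-flipped mode

# Proportion of positive draws, per parameter and chain
props <- sapply(list(chain1, chain2), function(ch) colMeans(ch > 0))
# Variance across chains: swap signs by the parameter with the largest
VarSign <- apply(props, 1, var)
names(which.max(VarSign)) # "a"
```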

# Utility function to get the (integer) indices of names matching a string in v
GetInds <- function(v, names) {
  if(length(v)==1) {
    res <- grep(v, names)
  } else {
    res <- c(unlist(sapply(v, grep, x=names)))
  }
  res
}

# Function to swap signs of all variables varinds to have same sign as vartosign.
#    Might have to change this to make sure different chains stick to the same sign
ReSignChain <- function(chain, varinds, vartosign) {
  #    MeanSign <- sign(mean(chain[  ,s])) # Might need this
  res <- t(apply(chain, 1, function(x, vs, vi) {
    Names <- names(x)[vi]
    if(any(grepl(",", Names))) {
      lvind <- gsub(".*, ", "", Names)
    } else {
      lvind <- seq_along(vs)
    }
    x[vi] <- x[vi]*sign(x[vs[lvind]])
    x
  }, vs=vartosign, vi=varinds))
  as.mcmc(res)
}



# Function to post-process chains to swap signs.
postProcess <- function(Chains, VarsToProcess, VarToSign=NULL, print=FALSE) {
  if(is.null(VarToSign)) VarToSign <- VarsToProcess
  SignInd <- GetInds(VarToSign, colnames(Chains[[1]]))
  # Get indicators for all variables to have their signs changed
  ProcessInds <- GetInds(VarsToProcess, names = colnames(Chains[[1]]))
  
  # Check if > 1 LV
  SeveralLVs <- any(grepl(",", colnames(Chains[[1]])[SignInd]))
  
  # Calculate the variance (across chains) of the proportion of positive draws:
  #    the largest variance should identify the parameter furthest from 0
  Signs <- as.data.frame(lapply(Chains, function(mat, ind) {
    colMeans(mat[,ind]>0)
  }, ind=SignInd))
  VarSign <- apply(Signs, 1, var)
  
  if(SeveralLVs) {
    LV <- gsub(".*, ", "", colnames(Chains[[1]])[SignInd])
    # Choose the variable whose sign will be used to swap the other signs
    VarsToSwapBy <- sapply(unique(LV), function(lv, vs) {
      vv <- vs[grep(lv, names(vs))]
      nm <- names(which(vv==max(vv)))
      if(length(nm)>1) nm <- nm[1] # probably something more sophisticated is better
      nm
    }, vs = VarSign, simplify = TRUE)
    chains.sgn <- lapply(Chains, ReSignChain, vartosign=VarsToSwapBy, varinds=ProcessInds)
    
  } else { # only 1 LV
    # Choose the variable whose sign will be used to swap the other signs
    VarToSwapBy <- names(which(VarSign==max(VarSign)))[1]
    if(print) message(paste0("Swapping by ", VarToSwapBy))
#    chains.sgn <- lapply(Chains, ReSignChain, varNames=VarsToProcess, sign.var=ind)
    chains.sgn <- lapply(Chains, ReSignChain, vartosign=VarToSwapBy, varinds=ProcessInds)
  }
  as.mcmc.list(chains.sgn)
}

Finally, we can run the MCMC.

chains.unsgn <- ParaNimble(4, fun = RunOneChain,
                       dat = dat,
                       code = OneLV.nim,
                       inits = inits, HMC = TRUE,
#                       Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
                       Nburn=1e3, NIter=2e3, Nthin=1, # for a big HMC run
                       consts = consts)

# post-process chains for sign-swapping
VarsToProcess <- c("^B", "^O", "^epsilon", "^varepsilon", "^z", "^gamma")
VarsToSign <- c("^B")
chains <- postProcess(chains.unsgn, VarsToProcess=VarsToProcess, 
                      VarToSign = VarsToSign, print = TRUE)
summ.HO = summary(chains)

Results

There are a lot of parameters, so a lot of results to look at. Here we concentrate on \(\boldsymbol{B}\), \(\boldsymbol{\omega}\), \(\boldsymbol{\sigma}\) and \(\boldsymbol{\delta}\), and a latent variable-plot.

We can look at the chains, to see that they have converged.

First the environmental effects

library(basicMCMCplots)
chainsPlot(chains, var = c("B"), legend = F,traceplot=T)

Then the trait effects

chainsPlot(chains, var = c("O"), legend = F,traceplot=TRUE)
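Beyond eyeballing the traceplots, we can quantify mixing with coda’s Gelman–Rubin diagnostic and effective sample sizes. `CheckConvergence()` is a small helper of our own (not part of coda), and assumes `chains` is the post-processed mcmc.list from above:

```r
library(coda)

# R-hat and effective sample size for all parameters matching a pattern
CheckConvergence <- function(chains, pattern = "^B\\[") {
  cols <- grep(pattern, colnames(chains[[1]]))
  sub <- as.mcmc.list(lapply(chains, function(ch) as.mcmc(ch[, cols])))
  list(psrf = gelman.diag(sub, multivariate = FALSE)$psrf,
       n.eff = effectiveSize(sub))
}
# CheckConvergence(chains)        # R-hats near 1 and reasonable ESS support convergence
# CheckConvergence(chains, "^O")  # the same for the trait effects
```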

These look quite OK (after post-processing). Now we can create a plot of the LV-scores against their indices to inspect the results.

We use a plotting function, which is hidden here
PlotPost <- function(var, summ, varnames=NULL, ...) {
  vars <- grep(var, rownames(summ$statistics))
  if(is.null(varnames)) varnames <- rownames(summ$statistics)[grep(var, rownames(summ$statistics))]
  if(length(varnames)!=length(vars)) 
    stop(paste0("Number of variable names, ", length(varnames), "not the same as number of variables, ",
                length(vars)))
  
  plot(summ$statistics[vars,"Mean"], 1:length(vars), 
       xlim=range(summ$quantiles[vars,]), yaxt="n", ...)
  segments(summ$quantiles[vars,"2.5%"], 1:length(vars), summ$quantiles[vars,"97.5%"], 1:length(vars))
  segments(summ$quantiles[vars,"25%"], 1:length(vars), summ$quantiles[vars,"75%"], 1:length(vars), lwd=3)
  abline(v=0, lty=3)
  axis(2, at=1:length(vars), labels=varnames, las=1)
}
par(mar=c(4.1,4.5,1,1))
PlotPost(var="B", summ=summ.HO, varnames = gsub("\\.", "\n", colnames(X)), 
         xlab="Environmental Effect", ylab="")

We can see that both coarse woody debris (CWD) and canopy cover have positive effects on the latent variable, so ants with a positive species score will be more abundant in areas with more canopy cover and CWD.

We can also look at the trait effects. The bottom line is that there is too much uncertainty to say anything.

par(mar=c(4.1, 7,1,1))
PlotPost(var="O", summ=summ.HO, varnames = gsub("\\.", "\n", colnames(TR)), 
         xlab="Trait Effect", ylab="")

We can also calculate the full distributions of site and species scores, and plot them.

We use a function, which is hidden here
ExtractScore <- function(mcmc.lst, betaname, epsname, XX, summ=FALSE, 
                         scorename="score", add=TRUE) {
  betaind <- grep(paste0('^', betaname), colnames(mcmc.lst[[1]]))
  epsind <- grep(paste0('^', epsname), colnames(mcmc.lst[[1]]))
  
  CalcScore <- function(v, betaind, epsind, XX, scorename) {
    res <- XX%*%v[betaind] + v[epsind]
    #    names(res) <- names(v[epsind])
    names(res) <- gsub(".*\\[", paste0(scorename, "["), names(v[epsind]))
    res
  }
  
  PostScore <- function(mc, LV=0, scorename, ...) {
    if(ncol(XX)==length(betaind)) {
      as.mcmc(t(apply(mc, 1, CalcScore, scorename=scorename, ...)))
    } else {
      
      LV.beta <- gsub(".*, |]", "", colnames(mc)[betaind])
      LV.eps <- gsub(".*, |]", "", colnames(mc)[epsind])
      LVs <- unique(LV.beta)
      
      res <-  sapply(LVs, function(lv, mc, betaind, epsind, l.be, l.e, XX, scorename) {
        bInd <- betaind[l.be==lv]
        eInd <- epsind[l.e==lv]
        
        apply(mc, 1, CalcScore, betaind=bInd, epsind=eInd, XX=XX, scorename=scorename)
      },  mc = mcmc.lst[[1]], betaind=betaind, epsind=epsind, l.be=LV.beta, 
      l.e=LV.eps, XX=XX, scorename=scorename, simplify=FALSE)
      
      res.mcmc <- as.mcmc(t(do.call(rbind, res)))
      res.mcmc
    }
  }
  
  
  #  score <- as.mcmc.list(lapply(mcmc.lst, PostScore, betaind=betaind, epsind=epsind, XX=XX))
  if(add) {
    score <- as.mcmc.list(lapply(mcmc.lst, function(lst, scorename, ...) {
      #        ps <- PostScore(lst, betaind=betaind, epsind=epsind, XX=XX)
      ps <- PostScore(lst, scorename=scorename, ...)
      as.mcmc(cbind(lst, ps))
    }, betaind=betaind, epsind=epsind, XX=XX, scorename=scorename))
  } else {
    score <- as.mcmc.list(lapply(mcmc.lst, PostScore, betaind=betaind, 
                                 epsind=epsind, XX=XX, scorename=scorename))
  }
  if(summ) score <- summary(score)
  score
}
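The computation inside ExtractScore() is simple: for every posterior draw, the site score is \(X B + \epsilon\) (and analogously \(R \omega + \varepsilon\) for the species scores). Here is a self-contained sketch of that calculation on simulated draws, with made-up dimensions:

```r
set.seed(3)
NDraws <- 200; NSites <- 6; NEnv <- 3
X        <- matrix(rnorm(NSites * NEnv), NSites, NEnv)
Bdraws   <- matrix(rnorm(NDraws * NEnv), NDraws, NEnv)               # draws of B
epsdraws <- matrix(rnorm(NDraws * NSites, sd = 0.1), NDraws, NSites) # draws of epsilon

# One site-score vector per posterior draw
scoreDraws <- t(sapply(seq_len(NDraws), function(d)
  drop(X %*% Bdraws[d, ]) + epsdraws[d, ]))  # NDraws x NSites

postMean <- colMeans(scoreDraws)
postCI   <- apply(scoreDraws, 2, quantile, probs = c(0.025, 0.975))
```

On the real output, something like `ExtractScore(chains, betaname = "B", epsname = "epsilon", XX = X, summ = TRUE)` should give the corresponding summaries (our reading of the intended call).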

We see that there is plenty of variation in the species scores, so the trait effects are not uncertain because the species scores are poorly estimated: it is because the trait effects themselves are small.

par(mfrow=c(2,1), mar=c(2,12,4,1))
PlotPost("^z", summ=summ.HO, varnames = NULL, ylab="", xlab="Latent variable 1", main = "Sites")
PlotPost("^gamma", summ=summ.HO, varnames = NULL, ylab="", xlab="Latent variable 1", main = "Species")

More dimensions

More than one dimension requires adding extra identifiability constraints. Now that we have two or more latent variables, we also have to worry about rotating them.

For those who are interested, this is how we deal with the rotations

There are various ways to place the constraints, some more numerically stable than others. Generally, we note that the model on the link scale is similar to existing matrix decompositions, so much about the required constraints can be learned from those.

To prevent rotational invariance, we require \(\boldsymbol{B}\) and \(\boldsymbol{\omega}\) to have zeros above the main diagonal. van der Veen et al (2023) instead assumed \(\boldsymbol{B}\) to be a semi-orthogonal matrix, as they encountered numerical instability with the constraint that we impose here. However, their constraint would require sampling on constrained parameter spaces, which is difficult, while our formulation allows us to use standard out-of-the-box Markov chain Monte Carlo samplers (also, it seems to work fine here). Note that when \(d\) exceeds the number of predictors or traits the model is again rotationally invariant, and additional constraints (e.g., order constraints for the latent variables, as in a varimax rotation) will need to be added.
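A quick numerical check of why constraints are needed: for any orthogonal \(M\), the rotated parameters give exactly the same linear predictor (base R, illustrative values):

```r
set.seed(4)
n <- 5; p <- 4; d <- 2
Z   <- matrix(rnorm(n * d), n, d) # site scores
G   <- matrix(rnorm(p * d), p, d) # species loadings
Sig <- diag(c(1.5, 0.5))          # latent variable scales
th  <- 0.7
M   <- matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2, 2) # orthogonal rotation

eta1 <- Z %*% Sig %*% t(G)
eta2 <- (Z %*% t(M)) %*% (M %*% Sig %*% t(M)) %*% t(G %*% t(M))
max(abs(eta1 - eta2)) # numerically zero: the likelihood cannot tell them apart
```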
The new model code is almost the same as before
# Update our constants from before with the new number of LVs, rest remains the same
# Model code
HO.nim <- nimbleCode({
  for (i in 1:NSites) {
    for (j in 1:NSpecies) {
      eta[i,j] <- beta0[j] + sum(gamma[j,1:nLVs]*sd.LV[1:nLVs]*z[i,1:nLVs])
      log(lambda[i, j]) <- eta[i, j]
      Y[i, j] ~ dpois(lambda[i, j])
    }      
    for (l in 1:nLVs) {
      XB[i, l] <- sum(X[i, 1:NEnv]*BSTAR[1:NEnv, l])
      epsilonSTAR[i,l] ~ dnorm(0, sd = sd.Site[l]) # Residual
      zSTAR[i,l] <- XB[i,l] + epsilonSTAR[i,l]
    }
  }
  
  for(j in 1:NSpecies) {
    for (l in 1:nLVs) {
      omegaTR[j, l] <- sum(TR[j, 1:NTraits]*OSTAR[1:NTraits, l])
      varepsilonSTAR[j,l] ~ dnorm(0, sd = sd.Species[l]) # Residual
      gammaSTAR[j,l] <- omegaTR[j,l] + varepsilonSTAR[j,l]
    }
    
    beta0[j] ~ dnorm(0, sd=1)
  }
  # Constraints to 0 on upper diagonal
  # stole some code from Boral for this - thanks Francis
  for(l in 1:(nLVs-1)) { 
    for(ll in (l+1):(nLVs)) {
      BSTAR[l,ll] <- 0 
    } 
  }

    
  for(l in 1:nLVs) { 
    # diagonal elements
    BSTAR[l,l] ~ dnorm(0,sd=1)

    ## standardizing z and gamma
    zmu[l] <- mean(zSTAR[1:NSites,l])
    StdDev.z[l] <- sqrt(mean((zSTAR[1:NSites,l] - zmu[l])^2))
    z[1:NSites,l] <- (zSTAR[1:NSites,l]-zmu[l])/StdDev.z[l]
    
    gammamu[l] <- mean(gammaSTAR[1:NSpecies,l])
    StdDev.gamma[l] <- sqrt(mean((gammaSTAR[1:NSpecies,l] - gammamu[l])^2)) 
    gamma[1:NSpecies,l] <- (gammaSTAR[1:NSpecies,l]-gammamu[l])/StdDev.gamma[l]
    
  # Scale other parameters
    epsmu[l] <- mean(epsilonSTAR[1:NSites,l])
    varepsmu[l] <- mean(varepsilonSTAR[1:NSpecies,l])
    
    epsilon[1:NSites,l] <- (epsilonSTAR[1:NSites,l]-epsmu[l])/StdDev.z[l]
    B[1:NEnv,l] <-  BSTAR[1:NEnv,l]/StdDev.z[l]
    varepsilon[1:NSpecies,l] <- (varepsilonSTAR[1:NSpecies,l]-varepsmu[l])/StdDev.gamma[l]
    O[1:NTraits,l] <-  OSTAR[1:NTraits,l]/StdDev.gamma[l]
  
    # priors for scales
    sd.Site[l] ~ dexp(1)
    sd.Species[l] ~ dexp(1)
    sd.LV[l] ~ dexp(1)
    } 
  
  ## Free lower diagonals
  for(l in 2:nLVs) { 
    for(ll in 1:(l-1)) { 
      BSTAR[l,ll] ~ dnorm(0,sd=1)
      } 
    }
  for(l in (nLVs+1):NEnv) { 
    for(ll in 1:(nLVs)) { 
      BSTAR[l,ll] ~ dnorm(0, sd = 1) 
    } 
  } ## All other elements

  # Prior for trait effects
    for(tr in 1:NTraits){
    for(l in 1:nLVs){
      OSTAR[tr,l] ~ dnorm(0, sd = 1)
    }
  }
})
We create a new function to simulate starting values from the prior distributions, which we can hide away
inits<-function(consts){
  BSTAR = matrix(rnorm(consts$nLVs*consts$NEnv),ncol=consts$nLVs)
  BSTAR[upper.tri(BSTAR)] = 0
  OSTAR = matrix(rnorm(consts$nLVs*consts$NTraits),nrow=consts$NTraits)
  varepsilonSTAR = mvtnorm::rmvnorm(consts$NSpecies,rep(0,consts$nLVs),diag(consts$nLVs))
  epsilonSTAR = mvtnorm::rmvnorm(consts$NSites,rep(0,consts$nLVs),diag(consts$nLVs))
  list(
    BSTAR = BSTAR,
    OSTAR = OSTAR,
    epsilonSTAR = epsilonSTAR,
    varepsilonSTAR = varepsilonSTAR,
    sd.Site = rexp(consts$nLVs),
    sd.Species = rexp(consts$nLVs),
    beta0 = rnorm(consts$NSpecies),
    sd.LV = rexp(consts$nLVs)
  )
}
We rotate the scores after the MCMC to make them more interpretable. Here is the code to do that, and an explanation of what is done.

Our model is

\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]

But we can rotate this by a rotation matrix \(M\) to get an equivalent model:

\[ \eta_{ij} = \beta_{0j} + (M\boldsymbol{z}_i)^\top M \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]

We do this by rotating \(\boldsymbol{z}_i^\top \boldsymbol{\Sigma}\).

The covariance matrix is now \(M \Sigma^2 M^T\).

\[ \begin{align} Var(M \boldsymbol{\Sigma} \boldsymbol{z}) &= E\left((M \boldsymbol{\Sigma} \boldsymbol{z})(M \boldsymbol{\Sigma} \boldsymbol{z})^\top\right) \\ &= M \boldsymbol{\Sigma} E(\boldsymbol{z} \boldsymbol{z}^\top) \boldsymbol{\Sigma}^\top M^\top \\ &= M \boldsymbol{\Sigma} \boldsymbol{\Sigma}^\top M^\top = M \boldsymbol{\Sigma}^2 M^\top, \end{align} \]

since \(E(\boldsymbol{z} \boldsymbol{z}^\top) = I\) because the latent variables are standardised.

There are many possible rotations (see the https://cran.r-project.org/web/packages/GPArotation/index.html package for a long list). Here we will just use the varimax rotation, which rotates the ordination so that most of the variation in \(\boldsymbol{\gamma}_j^\top \boldsymbol{\Sigma}\) is put on the first axis, then the second axis has the next largest amount of variance, etc.

In detail, for each draw from the posterior we:

  1. calculate a rotation matrix \(M\) from \(\gamma \Sigma\)
  2. Rotate \(B, \epsilon, \gamma, z, O\) and \(\varepsilon\) to give \(MB, M\epsilon, M\gamma, Mz, MO\) and \(M\varepsilon\)
  3. Rotate (the standard deviations) \(\Sigma\) and the standard deviations of \(B, \epsilon, O\) and \(\varepsilon\) to give \(M \Sigma\) etc.
  4. Re-scale \(B, \epsilon, \gamma, z, O\) and \(\varepsilon\): for each latent variable divide by the standard deviation of \(\gamma\) or \(z\) as appropriate, so that \(Var(\gamma_l) = Var(z_l) =1\).
  5. Multiply the standard deviations by the standard deviation of \(\gamma\) or \(z\) as appropriate.

The final two steps sweep the variance of \(M \gamma \Sigma z\) into \(\Sigma\), so it still represents the total variance of the latent variables. (NOTE: I AM NOT SURE THIS IS RIGHT)
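The rescaling in the final two steps can be checked numerically: dividing the columns of \(z\) and \(\gamma\) by their standard deviations and multiplying the diagonal of \(\Sigma\) by both leaves \(\eta\) unchanged (a base-R sketch with made-up values; centring is ignored here, as that shift is absorbed into \(\beta_{0j}\)):

```r
set.seed(5)
n <- 6; p <- 5; d <- 2
Z   <- matrix(rnorm(n * d, sd = 3), n, d)   # unscaled site scores
G   <- matrix(rnorm(p * d, sd = 0.4), p, d) # unscaled species loadings
Sig <- diag(c(1.2, 0.8))

sz <- apply(Z, 2, sd); sg <- apply(G, 2, sd)
Zs   <- sweep(Z, 2, sz, "/")      # standardised site scores
Gs   <- sweep(G, 2, sg, "/")      # standardised loadings
Sigs <- diag(diag(Sig) * sz * sg) # scales swept into Sigma

eta1 <- Z  %*% Sig  %*% t(G)
eta2 <- Zs %*% Sigs %*% t(Gs)
max(abs(eta1 - eta2)) # numerically zero
```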

# Function to convert a variable in an MCMC chain that should be a matrix from 
# a vector to the right matrix
ChainToMatrix <- function(ch, name) {
  v <- ch[grep(paste0("^", name, "\\["), names(ch))]
  l.row <- as.numeric(gsub(".*\\[", "", gsub(",.*", "", names(v))))
  l.col <- as.numeric(gsub(".*, ", "", gsub("\\]", "", names(v))))
  mat <- matrix(0, nrow=max(l.row), ncol=max(l.col))
  for(i in seq_along(ch)) { # there must be a quicker way...
    mat[l.row[i], l.col[i]] <- v[i]
  }
  mat
}

# Function to convert a variable in an MCMC chain that should be a vector to a diagonal matrix
ChainToDiag <- function(ch, name) {
  v <- ch[grep(paste0("^", name, "\\["), names(ch))]
  mat <- diag(v)
  mat
}


# Function to rotate a latent variable, defaults to Varimax. Methods in GPArotation package
#  ch: one draw from MCMC chain
#  RotateBy: name of variable to rotate by (i.e. to calculate the rotation matrix for)
#  SDby: name of Standard deviations
#  ToRotate: Variables to rotate (e.g. epsilons, gamma/z).
# SDToRotate: Standard deviations to rotate
# rotation.fn: Function in GPArotation to calculate the rotation. Defaults to Varimax
# scale: Should RotateBy be scaled by SDby before rotating? Defaults to FALSE
rotate2LV <- function(ch, RotateBy, SDby, ToRotate=NULL, SDToRotate=NULL, rotation.fn="Varimax", 
                      scale=FALSE) {
  require(GPArotation)
  # Function to rotate matrices & return full vector or a matrix (retmat)
  RotateVar <- function(name, vec, rotmat, retmat=FALSE) {
    mat <- ChainToMatrix(ch=vec, name=name)
    rotated <- mat%*%rotmat
    
    if(retmat) {
      res <- rotated
    } else {
      vec[grep(paste0("^", name, "\\["), names(vec))] <- c(rotated)
      res <- vec
    }
    res
  }
  # Function to rotate (diagonal) matrices & return the diagonal
  RotateSD <- function(name, vec, rotmat) {
    mat <- diag(vec[grep(paste0("^", name, "\\["), names(vec))])
    rotated <- mat%*%rotmat
    
    vec[grep(paste0("^", name, "\\["), names(vec))] <- diag(rotated)
    vec
  }
  
  # Extract standard deviations and latent variables
  SDs <- ch[grep(SDby, names(ch))]
  LVmat <- ChainToMatrix(ch, RotateBy)
  if(scale) LVmat <- sweep(LVmat, 2, SDs, "*")
  
  # Get rotation
  loadings.r  <- do.call(rotation.fn, list(A=LVmat)) # rotate
  
  # Rotate variables
  for(nm in unique(c(RotateBy, ToRotate))) {
    ch <- RotateVar(vec=ch, name=nm, rotmat=loadings.r$Th, retmat = FALSE)
  }
  for(nm in unique(c(SDby, SDToRotate))) {
    ch <- RotateSD(vec=ch, name=nm, rotmat=loadings.r$Th)
  }
  ch
}


RotateChains <- function(mcmc.lst, RotateBy="z", SDby = "sd.LV", 
                         ToRotate=c("B", "epsilon", "gamma", "z", "O", "varepsilon"), 
                         SDToRotate=c("sd.Site", "sd.Species"), ...) {
  rotated <- lapply(mcmc.lst, function(mcmc) {
    rot <- apply(mcmc, 1, rotate2LV, RotateBy=RotateBy, SDby = SDby, 
                         ToRotate=ToRotate, SDToRotate=SDToRotate, ...)
    as.mcmc(t(rot))
  })
  as.mcmc.list(rotated)
}

RescaleVars <- function(vec, ScaleBy, SDTorescale, ToRescale) {
  vec2 <- vec
  ScBy <- ChainToMatrix(vec, ScaleBy)
  SD <- apply(ScBy, 2, sd)
  
  for(nm in unique(c(ScaleBy, ToRescale))) {
      Sc.x <- ChainToMatrix(vec, nm)
      Sc.x.sc <- sweep(Sc.x, 2, SD, "/")
      vec2[grep(paste0("^", nm, "\\["), names(vec2))] <- c(Sc.x.sc)
  }
  for(nm in SDTorescale) {
    SD.x <- vec[grep(paste0("^", nm, "\\["), names(vec))]
    vec2[grep(paste0("^", nm, "\\["), names(vec2))] <- SD.x*SD
  }
  vec2
}

RescaleChains <- function(mcmc.lst, ...) {
  rescale <- lapply(mcmc.lst, function(mcmc) {
    rot <- apply(mcmc, 1, RescaleVars, ...)
    as.mcmc(t(rot))
  })
  as.mcmc.list(rescale)
}

We can run the fitting with the same code as before:

consts$nLVs <- nLVs <- 2
HO.2LV <- ParaNimble(4, fun = RunOneChain,
                       dat = dat,
                       code = HO.nim,
                       inits = inits, HMC=TRUE,
#                       Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
                       Nburn=1e3, NIter=3e3, Nthin=1, # for a full run
                       consts = consts)

# Varimax rotation

chains2LV.r <- RotateChains(HO.2LV, rotation.fn="Varimax")
## Loading required package: GPArotation
# Re-scale LVs etc.
chains2LV.r2 <- RescaleChains(chains2LV.r, ScaleBy="gamma", 
                              SDTorescale=c("sd.LV", "sd.Species"), 
                              ToRescale=c("gamma",  "O", "varepsilon"))
chains2LV.r3 <- RescaleChains(chains2LV.r2, ScaleBy="z", 
                              SDTorescale=c("sd.LV", "sd.Site"), 
                              ToRescale=c("z",  "B", "epsilon"))

# post-process chains for sign-swapping
chains2LV <- postProcess(chains2LV.r3, VarsToProcess=VarsToProcess, 
                         VarToSign = VarsToProcess[1:2])

# Calculate summaries
summ.HO2LV <- summary(chains2LV)

rm(chains2LV.r, chains2LV.r2, chains2LV.r3)

Results

First, we look at the mixing.

Environment first:

chainsPlot(chains2LV, var = c("B"), legend = F, traceplot=TRUE)

And the trait effects:

chainsPlot(chains2LV, var = c("O"), legend = F, traceplot=TRUE)

The traceplots look OK, although B[2,1] looks bimodal. This is not ideal, but we can explain why we think it is happening below by looking at the standard deviations:

Ratio <- chains2LV$chain1[,"sd.LV[1]"]/chains2LV$chain1[,"sd.LV[2]"]
Ratio.u <- HO.2LV$chain1[,"sd.LV[1]"]/HO.2LV$chain1[,"sd.LV[2]"]
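These ratios are easiest to digest as summaries: for example, `mean(Ratio < 1)` gives the proportion of draws in which the second latent variable has the larger scale. Here is a minimal self-contained sketch of the idea, with simulated draws standing in for the real MCMC samples (the means and SDs are taken from the table of scale parameters reported below):

```r
set.seed(101)
# Simulated stand-ins for posterior draws of sd.LV[1] and sd.LV[2];
# in the real analysis, use Ratio and Ratio.u from the chunk above instead
sd1.sim <- abs(rnorm(1000, mean = 0.64, sd = 0.11))
sd2.sim <- abs(rnorm(1000, mean = 0.65, sd = 0.11))
ratio.sim <- sd1.sim / sd2.sim
# proportion of draws in which the second LV has the larger scale:
# values near 0.5 mean the two axes are effectively interchangeable
mean(ratio.sim < 1)
```

If that proportion sits near 0.5, the sampler has no reason to keep the axes in a fixed order, which is exactly the situation that produces bimodal traceplots.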

SDs <- summ.HO2LV$statistics[grep("sd", rownames(summ.HO2LV$statistics)),]
rownames(SDs) <- c(paste0("Latent Variable ", 1:2), 
              paste0("Site ", 1:2),
              paste0("Species ", 1:2))
knitr::kable(SDs, digits=2, format = "html", table.attr = "style='width:70%;'")
                    Mean    SD  Naive SE  Time-series SE
Latent Variable 1   0.64  0.11         0            0.01
Latent Variable 2   0.65  0.11         0            0.01
Site 1              0.36  0.36         0            0.02
Site 2              0.45  0.43         0            0.02
Species 1           0.03  0.04         0            0.00
Species 2           0.04  0.06         0            0.00

The two latent variables seem to explain similar amounts of variation, and a closer inspection (not shown) suggests that the second LV can explain more variance than the first, which should not happen after the rotation.

In fact, if we plot B[2,1] against B[2,2] we see that the effect of the volume of lying CWD flips between the first and second latent axes. That it takes both positive and negative values suggests that the sign-swapping post-processing is keyed to a different variable, and is getting it wrong. Fixing the rotation should fix this.

plot(c(chains2LV$chain1[,"B[2, 1]"]), chains2LV$chain1[,"B[2, 2]"], type="n",
     xlab="B[2, 1]", ylab="B[2, 2]")
abline(v=0, lty=3); abline(h=0, lty=3); 
sapply(1:length(chains2LV), function(l, ch) 
  points(ch[[l]][,"B[2, 1]"], ch[[l]][,"B[2, 2]"], col=l, cex=0.3), ch=chains2LV)

Now, we can make two-dimensional ordination plots of sites and species, with their predictor effects. Note that these have been rotated, so that most of the variance should be on the first axis.

We need a function to plot the arrows, plus helpers to extract posterior means, which are hidden here
AddArrows <- function(coords, marg = par("usr"), col = 2) {
  origin <- c(mean(marg[1:2]), mean(marg[3:4]))
  Xlength <- diff(marg[1:2])/2 # half the width of the plotting region
  Ylength <- diff(marg[3:4])/2 # half the height of the plotting region
  # scale so the longest arrow spans 80% of the smaller half-range
  ends <- coords / max(abs(coords)) * min(Xlength, Ylength) * 0.8
  arrows(x0 = origin[1], y0 = origin[2],
         x1 = ends[, 1] + origin[1], y1 = ends[, 2] + origin[2],
         col = col, length = 0.1)
  text(x = origin[1] + ends[, 1] * 1.1,
       y = origin[2] + ends[, 2] * 1.1,
       labels = rownames(coords), col = col)
}
ExtractMeans <- function(mcmc.lst, name=NULL) { # this needs defensive programming
  if(is.null(name)) {
    ind <- 1:nrow(mcmc.lst$statistics)
  } else {
    ind <- grep(name, rownames(mcmc.lst$statistics))
  }
  Means <- mcmc.lst$statistics[ind,"Mean"]
  if(any(grep(",", names(Means)))) {
    Col <- gsub("]", "", unique(gsub(".*, ", "", names(Means))))
    res <- sapply(Col, function(cc, mn) {
      mn[grep(paste0(cc,"]"), names(mn))]
    }, Means)
  } else {
    res <- Means
  }
  res
}
GetMeans <- function(summ, name, d=1) {
  if(is.null(d)) d <- 1
  var <- summ$statistics[grep(paste0('^', name), rownames(summ$statistics)),"Mean"]
  matrix(var,ncol=d)
}


# LVMeans <- ExtractMeans(LVs, name = "^LV\\[")
# gammaMeans <- ExtractMeans(gamma, name="^gamma")

z.m <- GetMeans(summ.HO2LV, name="^z", d=consts$nLVs)
gamma.m <- GetMeans(summ.HO2LV, name="^gamma", d=consts$nLVs)
vareps.m <- GetMeans(summ.HO2LV, name="^varepsilon", d=consts$nLVs)
eps.m <- GetMeans(summ.HO2LV, name="^epsilon", d=consts$nLVs)
B.m <- GetMeans(summ.HO2LV, name="B", d=consts$nLVs)
O.m <- GetMeans(summ.HO2LV, name="O", d=consts$nLVs)

par(mfrow=c(2,1))
#LVs
plot(z.m,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Sites")
text(z.m,labels=1:consts$NSites)
AddArrows(marg = par("usr"), coords=B.m, col="red")

#gammas
plot(gamma.m,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Species")
text(gamma.m,labels=vegan::make.cepnames(colnames(Y)))
AddArrows(marg = par("usr"), coords=O.m, col="blue")

We can see that almost all of the environmental effects seem to act in the same way, mainly on LV1.

We can also look at the summary statistics (tabulated above) for the scale parameters of the latent variables, the site-specific residuals and the species-specific residuals.

These tell us how well the predictors explain the ordination: the species- and site-specific scale parameters would be zero if the predictors fully explained it. The scale parameters of the latent variables are analogous to singular values (the square roots of eigenvalues) in a classical ordination: they reflect a dimension's importance to the response.
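Since these scale parameters act like singular values, we can convert their posterior means into rough proportions of "variance explained" per axis. A minimal sketch, plugging in the posterior means from the table above:

```r
# Posterior means of the latent-variable scales (from the summary table)
sd.LV <- c(LV1 = 0.64, LV2 = 0.65)
# the squared scales play the role of eigenvalues, so normalise them
prop.var <- sd.LV^2 / sum(sd.LV^2)
round(prop.var, 2)
#  LV1  LV2
# 0.49 0.51
```

Here the two axes carry almost equal weight, consistent with the axis-swapping behaviour discussed above.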

Residual covariance matrix

The covariance of species is determined by traits, LV-specific variation and sites’ residual variation, and the covariance between sites is determined by the environment, LV-specific variation and species’ residual variation.

The maths behind this is covered here

The residual covariance matrix of the model (from which species associations are calculated) is determined by three terms:

  1. \(\boldsymbol{X}\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j \sim \mathcal{N}(0,\boldsymbol{X}\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{X}^\top)\)
  2. \(\boldsymbol{TR}\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i\sim \mathcal{N}(0,\boldsymbol{TR}\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{TR}^\top)\)
  3. \(\boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j = \sum \limits^{d}_{q=1}\Sigma_{q,q}\epsilon_{iq}\varepsilon_{jq}\), where each product \(\epsilon_{iq}\varepsilon_{jq}\) has a density proportional to \(\mathcal{K}_0(\sigma^{-1}_q\delta^{-1}_q\vert\epsilon_{iq}\varepsilon_{jq}\vert)\),

where \(\mathcal{K}_0\) denotes the zero order modified Bessel function of the second kind. The first term tells us that correlations between species are determined by the environment. The second term tells us that the correlation between sites is determined by species traits. The last term induces no correlation between sites and species for diagonal \(\boldsymbol{\Sigma}\), but serves to scale species associations. Consequently, the covariance for species \(j,j2\) and site \(i,i2\) is: \[\begin{multline} \Sigma_{i,i2,j,j2} = \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \\ \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j},\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2})+ 
\text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}). \end{multline}\] Third-order terms are zero for central normal random variables, so this simplifies to: \[\begin{equation} \Sigma_{i,i2,j,j2} = \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2}+ \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2})\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}). \end{equation}\] Here, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}) = \text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2}) = 0\), and \(\text{cov}(\boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) = 2\text{tr}(\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma})\). Further, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2})\) is zero for \(j \neq j2\) and \(\text{diag}(\boldsymbol{\delta}^2)\) otherwise, and similarly \(\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2})\) is zero for \(i \neq i2\) and \(\text{diag}(\boldsymbol{\sigma}^2)\) otherwise.

Consequently, for a species association matrix where we consider the block of the covariance matrix where \(i = i2\), or for the site-to-site matrix where we consider the block \(j = j2\), we have:

\[\begin{split} \Sigma_{j,j2} &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}) + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} \\ &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2}+ 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}, \end{split}\]

and

\[\begin{equation} \Sigma_{i,i2} = \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j} + \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}. \end{equation}\]

We can visualize these matrices for, e.g., the first site and the first species, respectively.

#species
spec.mat <- matrix(0, nrow=NSpecies, ncol=NSpecies)
# Sigma is the (diagonal) matrix of latent-variable scales; it enters the
# linear predictor once, so it appears twice in every covariance term
Sigma <- diag(SDs[grep("Latent Variable", rownames(SDs)),1])
Sigma.sp <- diag(SDs[grep("Species", rownames(SDs)),1]^2)
Sigma.si <- diag(SDs[grep("Site", rownames(SDs)),1]^2)

for(j in 1:NSpecies){
  for(j2 in 1:NSpecies){
    if(j==j2){ # cov(varepsilon_j, varepsilon_j2) is non-zero only when j == j2
      spec.mat[j,j2] <- X[1,,drop=FALSE]%*%B.m%*%Sigma%*%Sigma.sp%*%Sigma%*%t(B.m)%*%t(X[1,,drop=FALSE])
    }
    spec.mat[j,j2] <- spec.mat[j,j2] +
      TR[j,,drop=FALSE]%*%O.m%*%Sigma%*%Sigma.si%*%Sigma%*%t(O.m)%*%t(TR[j2,,drop=FALSE]) +
      2*sum(diag(Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma))
  }
}

spec.cor.mat <- cov2cor(spec.mat)
colnames(spec.cor.mat) <- row.names(spec.cor.mat) <- colnames(Y)
corrplot::corrplot(spec.cor.mat,type = "lower",order = "AOE", main = "Species", mar = c(1,1,1,1),tl.srt=45,tl.cex = .5)

#sites
site.mat <- matrix(0, nrow=NSites, ncol=NSites)

for(i in 1:NSites){
  for(i2 in 1:NSites){
    if(i==i2){ # cov(epsilon_i, epsilon_i2) is non-zero only when i == i2
      site.mat[i,i2] <- TR[1,,drop=FALSE]%*%O.m%*%Sigma%*%Sigma.si%*%Sigma%*%t(O.m)%*%t(TR[1,,drop=FALSE])
    }
    site.mat[i,i2] <- site.mat[i,i2] +
      X[i,,drop=FALSE]%*%B.m%*%Sigma%*%Sigma.sp%*%Sigma%*%t(B.m)%*%t(X[i2,,drop=FALSE]) +
      2*sum(diag(Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma))
  }
}

site.cor.mat <- cov2cor(site.mat)
colnames(site.cor.mat) <- row.names(site.cor.mat) <- paste("Site", 1:NSites)
corrplot::corrplot(site.cor.mat,type = "lower",order = "AOE", main = "Sites", mar = c(3,3,3,3),tl.srt=45,tl.cex = .5)